Overview

Dataset statistics

Number of variables: 15
Number of observations: 23040
Missing cells: 17
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.6 MiB
Average record size in memory: 120.0 B
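
The overview figures can be cross-checked by hand. The sketch below assumes that "Missing cells (%)" is the number of missing cells divided by variables × observations, and that non-zero values under 0.1% are displayed as "< 0.1%". The helper `format_pct` is hypothetical and only mimics that display; the per-column counts are the ones reported in the gender and city_code sections.

```python
# Figures copied from the Overview and from the gender / city_code sections.
n_variables = 15
n_observations = 23040
missing_by_column = {"gender": 9, "city_code": 8}  # every other column reports 0

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells


def format_pct(pct):
    """Assumed display rule: tiny non-zero percentages are shown as '< 0.1%'."""
    if 0 < pct < 0.1:
        return "< 0.1%"
    return f"{pct:.1f}%"


print(total_cells, missing_cells, format_pct(missing_pct))
```

Seventeen missing cells out of 345,600 is about 0.005%, which is why the overview shows "< 0.1%" rather than "0.0%".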

Variable types

Numeric: 9
Categorical: 6

Warnings

tran_date has a high cardinality: 1129 distinct values (High cardinality)
dob has a high cardinality: 3987 distinct values (High cardinality)
Unnamed: 0 is uniformly distributed (Uniform)
Unnamed: 0 has unique values (Unique)

Reproduction

Analysis started: 2021-02-02 16:49:07.970614
Analysis finished: 2021-02-02 16:49:21.175337
Duration: 13.2 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

transaction_id
Real number (ℝ≥0)

Distinct: 20878
Distinct (%): 90.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.006955222 × 10^10
Minimum: 3268991
Maximum: 9.998754963 × 10^10
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: 3268991
5-th percentile: 5090661728
Q1: 2.493314715 × 10^10
Median: 5.009188013 × 10^10
Q3: 7.532631777 × 10^10
95-th percentile: 9.508100653 × 10^10
Maximum: 9.998754963 × 10^10
Range: 9.998428064 × 10^10
Interquartile range (IQR): 5.039317062 × 10^10

Descriptive statistics

Standard deviation: 2.898061538 × 10^10
Coefficient of variation (CV): 0.5788071612
Kurtosis: -1.209031909
Mean: 5.006955222 × 10^10
Median Absolute Deviation (MAD): 2.520770895 × 10^10
Skewness: 0.001602727141
Sum: 1.153602483 × 10^15
Variance: 8.398760679 × 10^20
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
3.226393808 × 10^10    4       < 0.1%
3130889793             3       < 0.1%
5.065162125 × 10^10    3       < 0.1%
1.135553904 × 10^10    3       < 0.1%
3.204326498 × 10^10    3       < 0.1%
1.46940352 × 10^10     3       < 0.1%
4.150935019 × 10^10    3       < 0.1%
5.870764126 × 10^10    3       < 0.1%
5.580834711 × 10^10    3       < 0.1%
6.29193547 × 10^10     3       < 0.1%
Other values (20868)   23009   99.9%

Value                  Count   Frequency (%)
3268991                1       < 0.1%
7073244                1       < 0.1%
10861359               1       < 0.1%
15741026               1       < 0.1%
16165359               1       < 0.1%

Value                  Count   Frequency (%)
9.998754963 × 10^10    1       < 0.1%
9.998675162 × 10^10    2       < 0.1%
9.998512147 × 10^10    1       < 0.1%
9.996777519 × 10^10    1       < 0.1%
9.996351651 × 10^10    2       < 0.1%
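
The descriptive statistics for transaction_id are tied together by a few identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the range is maximum minus minimum, and the IQR is Q3 minus Q1. The sketch below rechecks those identities from the rounded figures printed in this section. It illustrates the arithmetic only and is not the code pandas-profiling runs.

```python
import math

# Rounded values copied from the transaction_id section of this report.
mean = 5.006955222e10
std = 2.898061538e10
minimum, maximum = 3268991, 9.998754963e10
q1, q3 = 2.493314715e10, 7.532631777e10

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

# Compare with the figures printed in the report, allowing for rounding.
checks = {
    "CV": (cv, 0.5788071612),
    "Variance": (variance, 8.398760679e20),
    "Range": (value_range, 9.998428064e10),
    "IQR": (iqr, 5.039317062e10),
}
for name, (computed, reported) in checks.items():
    ok = math.isclose(computed, reported, rel_tol=1e-6)
    print(f"{name}: computed={computed:.10g} reported={reported:.10g} match={ok}")
```

The same identities can be used to sanity-check any of the numeric variables in this report.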

Unnamed: 0
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 23040
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11525.45412
Minimum: 0
Maximum: 23052
Zeros: 1
Zeros (%): < 0.1%
Memory size: 180.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1151.95
Q1: 5761.75
Median: 11525.5
Q3: 17290.25
95-th percentile: 21900.05
Maximum: 23052
Range: 23052
Interquartile range (IQR): 11528.5

Descriptive statistics

Standard deviation: 6655.799472
Coefficient of variation (CV): 0.5774869606
Kurtosis: -1.200281
Mean: 11525.45412
Median Absolute Deviation (MAD): 5764.5
Skewness: 0.0001726856606
Sum: 265546463
Variance: 44299666.61
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
2047                   1       < 0.1%
2692                   1       < 0.1%
6790                   1       < 0.1%
4743                   1       < 0.1%
19084                  1       < 0.1%
17037                  1       < 0.1%
21135                  1       < 0.1%
10896                  1       < 0.1%
8849                   1       < 0.1%
14994                  1       < 0.1%
Other values (23030)   23030   > 99.9%

Value                  Count   Frequency (%)
0                      1       < 0.1%
1                      1       < 0.1%
2                      1       < 0.1%
3                      1       < 0.1%
4                      1       < 0.1%

Value                  Count   Frequency (%)
23052                  1       < 0.1%
23051                  1       < 0.1%
23050                  1       < 0.1%
23049                  1       < 0.1%
23048                  1       < 0.1%

cust_id
Real number (ℝ≥0)

Distinct: 5506
Distinct (%): 23.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 271021.8803
Minimum: 266783
Maximum: 275265
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: 266783
5-th percentile: 267241
Q1: 268935
Median: 270980.5
Q3: 273114.25
95-th percentile: 274839
Maximum: 275265
Range: 8482
Interquartile range (IQR): 4179.25

Descriptive statistics

Standard deviation: 2431.573668
Coefficient of variation (CV): 0.008971872183
Kurtosis: -1.183937792
Mean: 271021.8803
Median Absolute Deviation (MAD): 2082.5
Skewness: 0.01860150125
Sum: 6244344121
Variance: 5912550.505
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
268819                 13      0.1%
269449                 13      0.1%
266794                 12      0.1%
273014                 12      0.1%
269245                 12      0.1%
272415                 12      0.1%
275252                 12      0.1%
270831                 12      0.1%
274227                 12      0.1%
272286                 12      0.1%
Other values (5496)    22918   99.5%

Value                  Count   Frequency (%)
266783                 5       < 0.1%
266784                 3       < 0.1%
266785                 8       < 0.1%
266788                 4       < 0.1%
266794                 12      0.1%

Value                  Count   Frequency (%)
275265                 3       < 0.1%
275264                 2       < 0.1%
275262                 2       < 0.1%
275261                 5       < 0.1%
275257                 5       < 0.1%

tran_date
Categorical

HIGH CARDINALITY

Distinct: 1129
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Memory size: 180.1 KiB

2011-07-13: 35
2013-12-21: 33
2011-09-25: 33
2011-10-23: 33
2011-11-22: 33
Other values (1124): 22873

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 230400
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 2014-02-28
2nd row: 2014-02-27
3rd row: 2014-02-24
4th row: 2014-02-24
5th row: 2014-02-23

Value                  Count   Frequency (%)
2011-07-13             35      0.2%
2013-12-21             33      0.1%
2011-09-25             33      0.1%
2011-10-23             33      0.1%
2011-11-22             33      0.1%
2013-09-03             32      0.1%
2012-11-25             32      0.1%
2014-01-01             31      0.1%
2012-05-04             31      0.1%
2012-06-10             31      0.1%
Other values (1119)    22716   98.6%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
051469
22.3%
149851
21.6%
-46080
20.0%
244381
19.3%
312825
 
5.6%
45137
 
2.2%
84233
 
1.8%
94154
 
1.8%
74151
 
1.8%
54149
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number184320
80.0%
Dash Punctuation46080
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
051469
27.9%
149851
27.0%
244381
24.1%
312825
 
7.0%
45137
 
2.8%
84233
 
2.3%
94154
 
2.3%
74151
 
2.3%
54149
 
2.3%
63970
 
2.2%
ValueCountFrequency (%)
-46080
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common230400
100.0%

Most frequent character per script

ValueCountFrequency (%)
051469
22.3%
149851
21.6%
-46080
20.0%
244381
19.3%
312825
 
5.6%
45137
 
2.2%
84233
 
1.8%
94154
 
1.8%
74151
 
1.8%
54149
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII230400
100.0%

Most frequent character per block

ValueCountFrequency (%)
051469
22.3%
149851
21.6%
-46080
20.0%
244381
19.3%
312825
 
5.6%
45137
 
2.2%
84233
 
1.8%
94154
 
1.8%
74151
 
1.8%
54149
 
1.8%

prod_subcat_code
Real number (ℝ≥0)

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.148784722
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 5
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.726196737
Coefficient of variation (CV): 0.6060053987
Kurtosis: -1.421979537
Mean: 6.148784722
Median Absolute Deviation (MAD): 3
Skewness: 0.1931944433
Sum: 141668
Variance: 13.88454212
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value                  Count   Frequency (%)
4                      4000    17.4%
3                      3065    13.3%
10                     2991    13.0%
1                      2948    12.8%
11                     2057    8.9%
12                     2027    8.8%
7                      1043    4.5%
2                      1007    4.4%
6                      989     4.3%
9                      985     4.3%
Other values (2)       1928    8.4%

Value                  Count   Frequency (%)
1                      2948    12.8%
2                      1007    4.4%
3                      3065    13.3%
4                      4000    17.4%
5                      958     4.2%

Value                  Count   Frequency (%)
12                     2027    8.8%
11                     2057    8.9%
10                     2991    13.0%
9                      985     4.3%
8                      970     4.2%

prod_cat_code
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.763498264
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.677091287
Coefficient of variation (CV): 0.4456203164
Kurtosis: -1.240404185
Mean: 3.763498264
Median Absolute Deviation (MAD): 1
Skewness: -0.2155958816
Sum: 86711
Variance: 2.812635186
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value                  Count   Frequency (%)
5                      6066    26.3%
3                      4895    21.2%
6                      4126    17.9%
2                      2996    13.0%
1                      2960    12.8%
4                      1997    8.7%

Value                  Count   Frequency (%)
1                      2960    12.8%
2                      2996    13.0%
3                      4895    21.2%
4                      1997    8.7%
5                      6066    26.3%

Value                  Count   Frequency (%)
6                      4126    17.9%
5                      6066    26.3%
4                      1997    8.7%
3                      4895    21.2%
2                      2996    13.0%

quantity
Real number (ℝ)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.435763889
Minimum: -5
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: -5
5-th percentile: -3
Q1: 1
Median: 3
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 10
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.264326091
Coefficient of variation (CV): 0.9296164135
Kurtosis: 1.903364817
Mean: 2.435763889
Median Absolute Deviation (MAD): 1
Skewness: -1.31429443
Sum: 56120
Variance: 5.127172644
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value                  Count   Frequency (%)
5                      4259    18.5%
1                      4214    18.3%
3                      4174    18.1%
2                      4123    17.9%
4                      4106    17.8%
-4                     455     2.0%
-5                     452     2.0%
-2                     436     1.9%
-1                     417     1.8%
-3                     404     1.8%

Value                  Count   Frequency (%)
-5                     452     2.0%
-4                     455     2.0%
-3                     404     1.8%
-2                     436     1.9%
-1                     417     1.8%

Value                  Count   Frequency (%)
5                      4259    18.5%
4                      4106    17.8%
3                      4174    18.1%
2                      4123    17.9%
1                      4214    18.3%

rate
Real number (ℝ)

Distinct: 2551
Distinct (%): 11.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 637.0949653
Minimum: -1499
Maximum: 1500
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: -1499
5-th percentile: -764
Q1: 312
Median: 710
Q3: 1109
95-th percentile: 1423
Maximum: 1500
Range: 2999
Interquartile range (IQR): 797

Descriptive statistics

Standard deviation: 621.7273737
Coefficient of variation (CV): 0.9758786485
Kurtosis: 1.510645611
Mean: 637.0949653
Median Absolute Deviation (MAD): 399
Skewness: -1.146009956
Sum: 14678668
Variance: 386544.9272
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
672                    29      0.1%
552                    29      0.1%
472                    28      0.1%
615                    27      0.1%
618                    27      0.1%
922                    27      0.1%
1043                   26      0.1%
198                    26      0.1%
881                    25      0.1%
173                    25      0.1%
Other values (2541)    22771   98.8%

Value                  Count   Frequency (%)
-1499                  2       < 0.1%
-1498                  4       < 0.1%
-1497                  2       < 0.1%
-1496                  3       < 0.1%
-1494                  3       < 0.1%

Value                  Count   Frequency (%)
1500                   16      0.1%
1499                   13      0.1%
1498                   17      0.1%
1497                   8       < 0.1%
1496                   20      0.1%

total_amt
Real number (ℝ)

Distinct: 5764
Distinct (%): 25.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2109.865226
Minimum: -8270.925
Maximum: 8287.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: -8270.925
5-th percentile: -1962.48
Q1: 762.45
Median: 1756.95
Q3: 3570.255
95-th percentile: 6469.775
Maximum: 8287.5
Range: 16558.425
Interquartile range (IQR): 2807.805

Descriptive statistics

Standard deviation: 2505.610295
Coefficient of variation (CV): 1.187568885
Kurtosis: 1.501467648
Mean: 2109.865226
Median Absolute Deviation (MAD): 1252.5175
Skewness: -0.3306547843
Sum: 48611294.81
Variance: 6278082.948
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
676.26                 22      0.1%
729.3                  22      0.1%
3049.8                 22      0.1%
1021.02                21      0.1%
1359.15                21      0.1%
1591.2                 21      0.1%
861.9                  21      0.1%
397.8                  21      0.1%
486.2                  20      0.1%
1432.08                20      0.1%
Other values (5754)    22829   99.1%

Value                  Count   Frequency (%)
-8270.925              1       < 0.1%
-8160.425              1       < 0.1%
-8154.9                1       < 0.1%
-8143.85               1       < 0.1%
-8138.325              1       < 0.1%

Value                  Count   Frequency (%)
8287.5                 4       < 0.1%
8281.975               5       < 0.1%
8276.45                3       < 0.1%
8270.925               3       < 0.1%
8265.4                 9       < 0.1%

store_type
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 180.1 KiB

e-Shop: 9304
MBR: 4660
Flagship store: 4575
TeleShop: 4501

Length

Max length: 14
Median length: 6
Mean length: 7.372482639
Min length: 3

Characters and Unicode

Total characters: 169862
Distinct characters: 19
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: e-Shop
2nd row: e-Shop
3rd row: TeleShop
4th row: e-Shop
5th row: TeleShop

Value                  Count   Frequency (%)
e-Shop                 9304    40.4%
MBR                    4660    20.2%
Flagship store         4575    19.9%
TeleShop               4501    19.5%
Histogram of lengths of the category
Value                  Count   Frequency (%)
e-shop                 9304    33.7%
mbr                    4660    16.9%
store                  4575    16.6%
flagship               4575    16.6%
teleshop               4501    16.3%

Most occurring characters

ValueCountFrequency (%)
e22881
13.5%
h18380
10.8%
o18380
10.8%
p18380
10.8%
S13805
 
8.1%
-9304
 
5.5%
s9150
 
5.4%
l9076
 
5.3%
M4660
 
2.7%
B4660
 
2.7%
Other values (9)41186
24.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter119122
70.1%
Uppercase Letter36861
 
21.7%
Dash Punctuation9304
 
5.5%
Space Separator4575
 
2.7%

Most frequent character per category

ValueCountFrequency (%)
e22881
19.2%
h18380
15.4%
o18380
15.4%
p18380
15.4%
s9150
 
7.7%
l9076
 
7.6%
a4575
 
3.8%
g4575
 
3.8%
i4575
 
3.8%
t4575
 
3.8%
ValueCountFrequency (%)
S13805
37.5%
M4660
 
12.6%
B4660
 
12.6%
R4660
 
12.6%
F4575
 
12.4%
T4501
 
12.2%
ValueCountFrequency (%)
-9304
100.0%
ValueCountFrequency (%)
4575
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin155983
91.8%
Common13879
 
8.2%

Most frequent character per script

ValueCountFrequency (%)
e22881
14.7%
h18380
11.8%
o18380
11.8%
p18380
11.8%
S13805
8.9%
s9150
 
5.9%
l9076
 
5.8%
M4660
 
3.0%
B4660
 
3.0%
R4660
 
3.0%
Other values (7)31951
20.5%
ValueCountFrequency (%)
-9304
67.0%
4575
33.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII169862
100.0%

Most frequent character per block

ValueCountFrequency (%)
e22881
13.5%
h18380
10.8%
o18380
10.8%
p18380
10.8%
S13805
 
8.1%
-9304
 
5.5%
s9150
 
5.4%
l9076
 
5.3%
M4660
 
2.7%
B4660
 
2.7%
Other values (9)41186
24.2%

prod_cat
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 180.1 KiB

Books: 6066
Electronics: 4895
Home and kitchen: 4126
Footwear: 2996
Clothing: 2960

Length

Max length: 16
Median length: 8
Mean length: 8.933463542
Min length: 4

Characters and Unicode

Total characters: 205827
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Clothing
2nd row: Electronics
3rd row: Books
4th row: Home and kitchen
5th row: Books

Value                  Count   Frequency (%)
Books                  6066    26.3%
Electronics            4895    21.2%
Home and kitchen       4126    17.9%
Footwear               2996    13.0%
Clothing               2960    12.8%
Bags                   1997    8.7%
Histogram of lengths of the category
Value                  Count   Frequency (%)
books                  6066    19.4%
electronics            4895    15.6%
home                   4126    13.2%
and                    4126    13.2%
kitchen                4126    13.2%
footwear               2996    9.6%
clothing               2960    9.5%
bags                   1997    6.4%

Most occurring characters

ValueCountFrequency (%)
o30105
14.6%
e16143
 
7.8%
n16107
 
7.8%
t14977
 
7.3%
c13916
 
6.8%
s12958
 
6.3%
i11981
 
5.8%
k10192
 
5.0%
a9119
 
4.4%
8252
 
4.0%
Other values (12)62077
30.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter174535
84.8%
Uppercase Letter23040
 
11.2%
Space Separator8252
 
4.0%

Most frequent character per category

ValueCountFrequency (%)
o30105
17.2%
e16143
9.2%
n16107
9.2%
t14977
8.6%
c13916
8.0%
s12958
7.4%
i11981
 
6.9%
k10192
 
5.8%
a9119
 
5.2%
r7891
 
4.5%
Other values (6)31146
17.8%
ValueCountFrequency (%)
B8063
35.0%
E4895
21.2%
H4126
17.9%
F2996
 
13.0%
C2960
 
12.8%
ValueCountFrequency (%)
8252
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin197575
96.0%
Common8252
 
4.0%

Most frequent character per script

ValueCountFrequency (%)
o30105
15.2%
e16143
 
8.2%
n16107
 
8.2%
t14977
 
7.6%
c13916
 
7.0%
s12958
 
6.6%
i11981
 
6.1%
k10192
 
5.2%
a9119
 
4.6%
B8063
 
4.1%
Other values (11)54014
27.3%
ValueCountFrequency (%)
8252
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII205827
100.0%

Most frequent character per block

ValueCountFrequency (%)
o30105
14.6%
e16143
 
7.8%
n16107
 
7.8%
t14977
 
7.3%
c13916
 
6.8%
s12958
 
6.3%
i11981
 
5.8%
k10192
 
5.0%
a9119
 
4.4%
8252
 
4.0%
Other values (12)62077
30.2%

prod_subcat
Categorical

Distinct: 18
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 180.1 KiB

Women: 3046
Mens: 2910
Kids: 1997
Tools: 1061
Fiction: 1043
Other values (13): 12983

Length

Max length: 19
Median length: 6
Mean length: 6.966102431
Min length: 3

Characters and Unicode

Total characters: 160499
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Women
2nd row: Computers
3rd row: DIY
4th row: Bath
5th row: DIY

Value                  Count   Frequency (%)
Women                  3046    13.2%
Mens                   2910    12.6%
Kids                   1997    8.7%
Tools                  1061    4.6%
Fiction                1043    4.5%
Kitchen                1036    4.5%
Children               1035    4.5%
Comics                 1030    4.5%
Mobiles                1030    4.5%
Bath                   1022    4.4%
Other values (8)       7830    34.0%
Histogram of lengths of the category
Value                  Count   Frequency (%)
women                  3046    11.8%
mens                   2910    11.2%
kids                   1997    7.7%
tools                  1061    4.1%
fiction                1043    4.0%
kitchen                1036    4.0%
children               1035    4.0%
comics                 1030    4.0%
mobiles                1030    4.0%
bath                   1022    3.9%
Other values (11)      10704   41.3%

Most occurring characters

ValueCountFrequency (%)
i16074
 
10.0%
n15982
 
10.0%
e14858
 
9.3%
o14109
 
8.8%
s12918
 
8.0%
c7014
 
4.4%
m6985
 
4.4%
d6854
 
4.3%
a6850
 
4.3%
l5066
 
3.2%
Other values (23)53789
33.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter129631
80.8%
Uppercase Letter26991
 
16.8%
Space Separator2874
 
1.8%
Dash Punctuation1003
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
i16074
12.4%
n15982
12.3%
e14858
11.5%
o14109
10.9%
s12918
10.0%
c7014
 
5.4%
m6985
 
5.4%
d6854
 
5.3%
a6850
 
5.3%
l5066
 
3.9%
Other values (8)22921
17.7%
ValueCountFrequency (%)
C4008
14.8%
M3940
14.6%
F3053
11.3%
W3046
11.3%
K3033
11.2%
A2888
10.7%
T1061
 
3.9%
B1022
 
3.8%
N1003
 
3.7%
D989
 
3.7%
Other values (3)2948
10.9%
ValueCountFrequency (%)
2874
100.0%
ValueCountFrequency (%)
-1003
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin156622
97.6%
Common3877
 
2.4%

Most frequent character per script

ValueCountFrequency (%)
i16074
 
10.3%
n15982
 
10.2%
e14858
 
9.5%
o14109
 
9.0%
s12918
 
8.2%
c7014
 
4.5%
m6985
 
4.5%
d6854
 
4.4%
a6850
 
4.4%
l5066
 
3.2%
Other values (21)49912
31.9%
ValueCountFrequency (%)
2874
74.1%
-1003
 
25.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII160499
100.0%

Most frequent character per block

ValueCountFrequency (%)
i16074
 
10.0%
n15982
 
10.0%
e14858
 
9.3%
o14109
 
8.8%
s12918
 
8.0%
c7014
 
4.4%
m6985
 
4.4%
d6854
 
4.3%
a6850
 
4.3%
l5066
 
3.2%
Other values (23)53789
33.5%

dob
Categorical

HIGH CARDINALITY

Distinct: 3987
Distinct (%): 17.3%
Missing: 0
Missing (%): 0.0%
Memory size: 180.1 KiB

1982-09-17: 32
1988-12-27: 32
1974-02-25: 27
1972-03-20: 25
1970-06-09: 24
Other values (3982): 22900

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 230400
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 200
Unique (%): 0.9%

Sample

1st row: 1981-09-26
2nd row: 1973-05-11
3rd row: 1992-07-27
4th row: 1981-06-08
5th row: 1992-07-27

Value                  Count   Frequency (%)
1982-09-17             32      0.1%
1988-12-27             32      0.1%
1974-02-25             27      0.1%
1972-03-20             25      0.1%
1970-06-09             24      0.1%
1991-11-18             24      0.1%
1977-05-26             23      0.1%
1983-03-08             22      0.1%
1981-12-20             22      0.1%
1988-07-21             22      0.1%
Other values (3977)    22787   98.9%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
-46080
20.0%
145615
19.8%
932481
14.1%
030634
13.3%
217615
 
7.6%
716602
 
7.2%
816286
 
7.1%
56512
 
2.8%
66258
 
2.7%
46226
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number184320
80.0%
Dash Punctuation46080
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
145615
24.7%
932481
17.6%
030634
16.6%
217615
 
9.6%
716602
 
9.0%
816286
 
8.8%
56512
 
3.5%
66258
 
3.4%
46226
 
3.4%
36091
 
3.3%
ValueCountFrequency (%)
-46080
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common230400
100.0%

Most frequent character per script

ValueCountFrequency (%)
-46080
20.0%
145615
19.8%
932481
14.1%
030634
13.3%
217615
 
7.6%
716602
 
7.2%
816286
 
7.1%
56512
 
2.8%
66258
 
2.7%
46226
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII230400
100.0%

Most frequent character per block

ValueCountFrequency (%)
-46080
20.0%
145615
19.8%
932481
14.1%
030634
13.3%
217615
 
7.6%
716602
 
7.2%
816286
 
7.1%
56512
 
2.8%
66258
 
2.7%
46226
 
2.7%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 9
Missing (%): < 0.1%
Memory size: 180.1 KiB

M: 11804
F: 11227

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 23031
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: M
4th row: M
5th row: M

Value                  Count   Frequency (%)
M                      11804   51.2%
F                      11227   48.7%
(Missing)              9       < 0.1%
Histogram of lengths of the category
Value                  Count   Frequency (%)
m                      11804   51.3%
f                      11227   48.7%

Most occurring characters

ValueCountFrequency (%)
M11804
51.3%
F11227
48.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter23031
100.0%

Most frequent character per category

ValueCountFrequency (%)
M11804
51.3%
F11227
48.7%

Most occurring scripts

ValueCountFrequency (%)
Latin23031
100.0%

Most frequent character per script

ValueCountFrequency (%)
M11804
51.3%
F11227
48.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII23031
100.0%

Most frequent character per block

ValueCountFrequency (%)
M11804
51.3%
F11227
48.7%

city_code
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 8
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.483067037
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 180.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 5
Q3: 8
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.863330666
Coefficient of variation (CV): 0.5222133246
Kurtosis: -1.218240634
Mean: 5.483067037
Median Absolute Deviation (MAD): 2
Skewness: 0.02075893494
Sum: 126286
Variance: 8.198662505
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value                  Count   Frequency (%)
4                      2422    10.5%
3                      2410    10.5%
5                      2357    10.2%
7                      2356    10.2%
10                     2333    10.1%
8                      2328    10.1%
2                      2268    9.8%
1                      2255    9.8%
9                      2176    9.4%
6                      2127    9.2%
(Missing)              8       < 0.1%

Value                  Count   Frequency (%)
1                      2255    9.8%
2                      2268    9.8%
3                      2410    10.5%
4                      2422    10.5%
5                      2357    10.2%

Value                  Count   Frequency (%)
10                     2333    10.1%
9                      2176    9.4%
8                      2328    10.1%
7                      2356    10.2%
6                      2127    9.2%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
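
The definition above translates directly into a few lines of standard-library Python. This is a minimal sketch on hypothetical toy data, not the routine the report itself uses; the function name `pearson_r` and the sample values are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    # The 1/n factors of covariance and variances cancel, so they are omitted.
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, not taken from the profiled dataset.
quantity = [1, 2, 3, 4, 5]
amount = [10.0, 20.0, 30.0, 40.0, 50.0]

r_linear = pearson_r(quantity, amount)
r_rescaled = pearson_r(quantity, [3 * a + 7 for a in amount])  # new location and scale
r_negative = pearson_r(quantity, [-a for a in amount])
print(r_linear, r_rescaled, r_negative)
```

Rescaling and shifting `amount` leaves r unchanged, which is the invariance property described above; flipping its sign flips the sign of r.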

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
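
The rank-based calculation can be sketched as follows. This is an illustrative plain-Python version, assuming ties receive the average of their positions; the helper names `average_ranks`, `pearson_r` and `spearman_rho` and the sample values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
tied = average_ranks([10, 20, 20, 30])
```

Here `rho` reaches the maximum because the relationship is perfectly monotonic, while `r` stays below it because the relationship is not linear.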

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
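
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements that definition directly; it is an assumption that this simple form is adequate here, since library implementations commonly use the tau-b variant, which adjusts for ties. The function name and data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)

# Made-up data: one swapped pair out of six.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau_a(xs, ys)  # 5 concordant, 1 discordant, 6 pairs
```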

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
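
To make the bias correction concrete, here is a minimal sketch of the Bergsma (2013) corrected estimator computed from a contingency table of counts. It assumes every row and column total is non-zero, and it is not taken from the report's software; the function name and the example tables are hypothetical.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Made-up 2x2 tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[10, 0], [0, 10]])
v_independent = cramers_v_corrected([[5, 5], [5, 5]])
```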

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
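
One way to read nullity correlation is as Pearson's r between the missing/present indicators of two columns. The sketch below assumes that interpretation; the function name is hypothetical and the column values are invented for the example, not taken from the dataset.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson's r between the is-missing indicators of two equally long columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no nullity variation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Invented values: city_code and gender are missing in the same rows, dob never is.
city_code = [5.0, None, 8.0, None, 3.0]
gender = ["M", None, "F", None, "M"]
dob = ["1981-09-26", "1973-05-11", "1992-07-27", "1981-06-08", "1982-10-09"]

same_rows = nullity_correlation(city_code, gender)
no_variation = nullity_correlation(city_code, dob)
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a fully populated column yields no defined correlation.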
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

index | transaction_id | Unnamed: 0 | cust_id | tran_date | prod_subcat_code | prod_cat_code | quantity | rate | total_amt | store_type | prod_cat | prod_subcat | dob | gender | city_code
0 | 80712190438 | 0 | 270351 | 2014-02-28 | 1 | 1 | -5 | -772 | -4265.300 | e-Shop | Clothing | Women | 1981-09-26 | M | 5.0
1 | 29258453508 | 1 | 270384 | 2014-02-27 | 5 | 3 | -5 | -1497 | -8270.925 | e-Shop | Electronics | Computers | 1973-05-11 | F | 8.0
2 | 51750724947 | 2 | 273420 | 2014-02-24 | 6 | 5 | -2 | -791 | -1748.110 | TeleShop | Books | DIY | 1992-07-27 | M | 8.0
3 | 93274880719 | 3 | 271509 | 2014-02-24 | 11 | 6 | -3 | -1363 | -4518.345 | e-Shop | Home and kitchen | Bath | 1981-06-08 | M | 3.0
4 | 51750724947 | 4 | 273420 | 2014-02-23 | 6 | 5 | -2 | -791 | -1748.110 | TeleShop | Books | DIY | 1992-07-27 | M | 8.0
5 | 97439039119 | 5 | 272357 | 2014-02-23 | 8 | 3 | -2 | -824 | -1821.040 | TeleShop | Electronics | Personal Appliances | 1982-10-09 | F | 6.0
6 | 45649838090 | 6 | 273667 | 2014-02-22 | 11 | 6 | -1 | -1450 | -1602.250 | e-Shop | Home and kitchen | Bath | 1981-05-29 | M | 9.0
7 | 22643667930 | 7 | 271489 | 2014-02-22 | 12 | 6 | -1 | -1225 | -1353.625 | TeleShop | Home and kitchen | Tools | 1971-04-21 | M | 9.0
8 | 79792372943 | 8 | 275108 | 2014-02-22 | 3 | 1 | -3 | -908 | -3010.020 | MBR | Clothing | Kids | 1971-11-04 | F | 8.0
9 | 50076728598 | 9 | 269014 | 2014-02-21 | 8 | 3 | -4 | -581 | -2568.020 | e-Shop | Electronics | Personal Appliances | 1979-11-27 | F | 3.0

Last rows

index | transaction_id | Unnamed: 0 | cust_id | tran_date | prod_subcat_code | prod_cat_code | quantity | rate | total_amt | store_type | prod_cat | prod_subcat | dob | gender | city_code
23030 | 49882891062 | 23043 | 271982 | 2011-01-25 | 10 | 5 | 4 | 1330 | 5878.600 | e-Shop | Books | Non-Fiction | 1976-08-10 | M | 8.0
23031 | 14787475597 | 23044 | 273982 | 2011-01-25 | 4 | 3 | 5 | 969 | 5353.725 | e-Shop | Electronics | Mobiles | 1991-10-12 | M | 4.0
23032 | 50691119572 | 23045 | 273031 | 2011-01-25 | 6 | 5 | 1 | 1148 | 1268.540 | TeleShop | Books | DIY | 1980-01-17 | F | 8.0
23033 | 40893803228 | 23046 | 272049 | 2011-01-25 | 11 | 6 | 3 | 1077 | 3570.255 | e-Shop | Home and kitchen | Bath | 1975-06-28 | F | 6.0
23034 | 30856003613 | 23047 | 266866 | 2011-01-25 | 4 | 2 | 2 | 444 | 981.240 | TeleShop | Footwear | Kids | 1974-04-18 | M | 4.0
23035 | 94340757522 | 23048 | 274550 | 2011-01-25 | 12 | 5 | 1 | 1264 | 1396.720 | e-Shop | Books | Academic | 1972-02-21 | M | 7.0
23036 | 89780862956 | 23049 | 270022 | 2011-01-25 | 4 | 1 | 1 | 677 | 748.085 | e-Shop | Clothing | Mens | 1984-04-27 | M | 9.0
23037 | 85115299378 | 23050 | 271020 | 2011-01-25 | 2 | 6 | 4 | 1052 | 4649.840 | MBR | Home and kitchen | Furnishing | 1976-06-20 | M | 8.0
23038 | 72870271171 | 23051 | 270911 | 2011-01-25 | 11 | 5 | 3 | 1142 | 3785.730 | TeleShop | Books | Children | 1970-05-22 | M | 2.0
23039 | 77960931771 | 23052 | 271961 | 2011-01-25 | 11 | 5 | 1 | 447 | 493.935 | TeleShop | Books | Children | 1982-01-15 | M | 1.0